Abstract

Background: The coronavirus disease 2019 (COVID-19) pandemic has been
spreading globally for months, yet the infection fatality ratio of the disease
is still uncertain. This is partly because of inconsistencies in testing and
death reporting standards across countries. Our purpose is to provide accurate
estimates which do not rely on testing and death count data directly but only
use population level statistics.

Methods: We collected demographic and death records data from the Italian
Institute of Statistics. We focused on the area in Italy that experienced
the initial outbreak of COVID-19 and estimated a Bayesian model fitting
age-stratified mortality data from 2020 and previous years. We also assessed
the sensitivity of our estimates to alternative assumptions on the proportion
of population infected.

Findings: We estimate an overall infection fatality rate of 1.29% (95%
credible interval [CrI] 0.89-2.01), as well as large differences by age, with
a low infection fatality rate of 0.05% for people under 60 years of age
(CrI 0-0.19) and a substantially higher 4.25% (CrI 3.01-6.39) for people
above 60 years of age. In our sensitivity analysis, we found that even under
extreme assumptions, our method delivered useful information. For instance,
even if only 10% of the population were infected, the infection fatality rate
would not rise above 0.2% for people under 60.

Interpretation: Our empirical estimates based on population level data
show a sharp difference in fatality rates between young and old people and
firmly rule out overall fatality ratios below 0.5% in populations with more
than 30% over 60 years old.
1. Introduction

Estimating the severity of the coronavirus disease 2019 (COVID-19), caused
by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2),
across demographic groups is of medical interest, but will also be crucial in
the design of health and economic policies to deal with the next phases of
the pandemic.

Several approaches to measure lethality have been proposed since the start
of the outbreak [1, 2, 3, 4], with most of the relevant population infected.
Given the mounting evidence that a substantial proportion of infected people
are either asymptomatic or display very mild symptoms [6], as well as the
difficulties many countries are encountering in ramping up testing, it has
been difficult to obtain precise estimates of the total number of infected.
Testing mainly symptomatic cases on the basis of clinical studies on the
symptoms of COVID-19, as announced for instance by the Italian government,
might also have led to underestimating the total number of infected. While
measurement of deaths is reliable, statistically significant deviations in total
deaths relative to previous years have been observed in the most affected
areas, leading to concerns that official COVID-19 death counts might also
be underestimated in some cases [12]. Together, those hurdles make estimating
the true infection fatality rate challenging.

To sidestep some of those issues, in this article we used an empirical
approach employing publicly available aggregate deaths and demographic data
to obtain infection fatality ratio estimates without relying on official data on
COVID-19 positive cases and deaths. The key observation is that, assuming
an accurate measurement of fatalities in a population, infection fatality ratio
estimates are less strongly dependent on accurate measurement of total cases
when the share of population infected is larger: keeping the number of deaths
fixed, the estimate changes much less when the fraction of the population
infected varies from 40% to 60% than when it changes from 2% to 3%, simply
because ratios are nonlinear.
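
The nonlinearity argument can be checked with a few lines of arithmetic. The sketch below is only an illustration: the death count and population are hypothetical round numbers, and `implied_ifr` is a helper name introduced here, not part of the analysis. It shows that, in absolute terms, the implied fatality ratio moves far more between a 2% and a 3% infected share than between 40% and 60%.

```python
def implied_ifr(deaths, population, infected_share):
    """Fatality ratio implied by a death count and an assumed infected share."""
    return deaths / (population * infected_share)


# Hypothetical round numbers, chosen only to illustrate the argument.
deaths, population = 250, 50_000

# Absolute change in the implied ratio when the infected share is uncertain.
change_high = implied_ifr(deaths, population, 0.40) - implied_ifr(deaths, population, 0.60)
change_low = implied_ifr(deaths, population, 0.02) - implied_ifr(deaths, population, 0.03)

print(f"40% -> 60% infected: ratio moves by {100 * change_high:.2f} percentage points")
print(f" 2% ->  3% infected: ratio moves by {100 * change_low:.2f} percentage points")
```

Both scenarios change the infected share by the same factor of 1.5, so the relative change in the ratio is identical; it is the absolute change that shrinks as the infected share grows.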

We were therefore able to obtain precise fatality estimates by age range
focusing on one of the hardest hit areas in Lombardy, which was placed
under a lockdown order as early as February 21st 2020. This area includes
ten towns and has a population of around 50 thousand people. The first
recorded patient infected through community spread in Italy was admitted
to intensive care on February 20th and the area already had 36 confirmed
cases the day after [13]. While widespread randomized antibody testing has not
yet been conducted in this area, 30% of a sample of blood donors from all
ten municipalities tested positive for antibodies [14], and a smaller sample
of 60 asymptomatic blood donors in one of the towns under lockdown showed 40
(66%) positive cases [15]. Although these samples might not be fully
representative of the population of these areas, these figures are highly
suggestive of widespread contagion.

2. Methodology 

2.1. Data 

We focused on ten Italian municipalities in Lombardy that experienced the
initial outbreak of COVID-19. Data on deaths were collected from the
Italian Institute of Statistics (ISTAT) [16]. We built estimates of total death
counts based on daily data on recorded deaths for 2020 and for the period
2015-2019 that ISTAT collects from the Anagrafe Nazionale
della Popolazione Residente (National Census of Resident Population). The
data contain information about gender and age group (in 5-year bins) for
each recorded death until April 4th 2020. ISTAT has released data for only
seven of the ten municipalities that experienced the initial outbreak. Hence,
we focused our analysis on this subsample. The excluded towns are much
smaller than the others: the total population of those three towns in 2019
was 3,543, while the remaining seven municipalities had a total population
of 47,020.

Death count data were complemented with information provided by ISTAT
on the demographics of each municipality [17]. For every city we collected
total population by age range in each year from 2015 to 2019. We used year
2019 information for year 2020 since data on 2020 has not been released yet.

2.2. Comparing 2020 deaths to previous years 

In order to motivate the use of administrative death counts for our infection
fatality rate estimation, we begin by analysing patterns in overall mortality
in 2020 and previous years. In particular, for every day between February
21st (the beginning of the outbreak) and April 4th we computed the difference
between the number of 2020 deaths and the 2015-2019 average, for the
total population and for different age groups. We compared total deaths to the
average number and fluctuations in deaths in past years to assess the signal
to noise ratio of this measure.
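
As a rough illustration of this comparison, the sketch below uses only the overall death totals reported in Table 1 (the same totals behind the 341 vs 70 comparison quoted in the Results). The variable names and the choice of the sample standard deviation as the noise measure are assumptions made here for illustration, not necessarily the exact procedure used in the analysis.

```python
import statistics

# Overall death totals from Table 1.
past_deaths = [78, 54, 70, 73, 79]   # 2015-2019
deaths_2020 = 341

baseline = statistics.mean(past_deaths)    # average of previous years
noise = statistics.stdev(past_deaths)      # year-to-year fluctuation (assumed noise measure)
excess = deaths_2020 - baseline            # deaths above the historical average

ratio_to_baseline = deaths_2020 / baseline
signal_to_noise = excess / noise

print(f"baseline {baseline:.1f}, excess {excess:.1f}")
print(f"2020 deaths are {ratio_to_baseline:.2f} times the baseline")
print(f"excess is {signal_to_noise:.1f} standard deviations of past fluctuations")
```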

2.3. Bayesian Estimation of COVID-19 infection fatality rate 
We employ a Bayesian framework to estimate the infection fatality rate of
COVID-19 by adapting a standard binomial mortality model to our setting [18].
The likelihood function of the model is obtained by assuming deaths in the
period between February 28th and April 4th of each year are binomially
distributed according to:

$$D_{i,a,y} \sim \mathrm{Binomial}\left(\delta_a,\; N_{i,a,y}\right) \quad \text{for } y \in \{2015, \dots, 2019\} \qquad (1)$$

$$D_{i,a,2020} \sim \mathrm{Binomial}\left(\delta_a + \delta^{\mathrm{Covid}}_a \cdot \theta_i,\; N_{i,a,2020}\right) \qquad (2)$$

where i denotes the municipality, y the year, and a the age range. We used
seven age ranges: 0-20, 21-40, 41-50, 51-60, 61-70, 71-80, 81+.
$D_{i,a,y}$ and $N_{i,a,y}$ are the total deaths and population in town i,
year y and age range a, respectively. The baseline lethality rates $\delta_a$
are heterogeneous across age ranges, but were assumed to be constant across
municipalities and years.

$\delta^{\mathrm{Covid}}_a$ is the infection fatality rate for age range a,
and we assumed $\delta^{\mathrm{Covid}}_a = 0$ in every year before 2020, when
COVID-19 was not present. We also assumed that infection rates $\theta_i$ are
heterogeneous across municipalities but constant across age groups.

We assumed the following priors for the parameters of interest: 


$$\delta_a \sim \mathrm{Uniform}[0,\, 0.1] \qquad (3)$$

$$\delta^{\mathrm{Covid}}_a \sim \mathrm{Uniform}[0,\, 0.3] \qquad (4)$$

$$\theta_i \sim \mathrm{Beta}(3, 2). \qquad (5)$$


Priors on baseline and COVID-19 death rates were chosen to be uninformative.
The prior on infection rates was chosen to reflect the results of
the antibody testing in one of the municipalities [15], while still
remaining only weakly informative.
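
To make the structure of the model concrete, the following sketch evaluates the unnormalised log posterior for a single municipality and a single age range, combining the binomial likelihoods for 2015-2019 and 2020 with the three priors. It is a minimal illustration, not the estimation code: the function names and the death and population counts in the usage lines are hypothetical, and boundary rates of exactly 0 or 1 are treated as impossible for simplicity.

```python
import math


def log_binomial_pmf(deaths, population, rate):
    """Log probability of `deaths` out of `population` at a given death rate."""
    if not 0.0 < rate < 1.0:
        return -math.inf  # simplification: boundary rates treated as impossible
    log_coeff = (math.lgamma(population + 1) - math.lgamma(deaths + 1)
                 - math.lgamma(population - deaths + 1))
    return log_coeff + deaths * math.log(rate) + (population - deaths) * math.log1p(-rate)


def log_prior(delta, delta_covid, theta):
    """Uniform[0, 0.1], Uniform[0, 0.3] and Beta(3, 2) priors, up to a constant."""
    in_support = 0.0 <= delta <= 0.1 and 0.0 <= delta_covid <= 0.3 and 0.0 < theta < 1.0
    if not in_support:
        return -math.inf
    # The uniform densities are constant; the Beta(3, 2) density is 12 * theta^2 * (1 - theta).
    return math.log(12.0) + 2.0 * math.log(theta) + math.log1p(-theta)


def log_posterior(delta, delta_covid, theta, past_counts, counts_2020):
    """Unnormalised log posterior for one municipality and one age range."""
    total = log_prior(delta, delta_covid, theta)
    if total == -math.inf:
        return total
    for deaths, population in past_counts:       # years without COVID-19
        total += log_binomial_pmf(deaths, population, delta)
    deaths, population = counts_2020             # 2020: baseline plus COVID-19 deaths
    total += log_binomial_pmf(deaths, population, delta + delta_covid * theta)
    return total


# Hypothetical counts as (deaths, population) pairs, for illustration only.
past_counts = [(10, 1000)] * 5
counts_2020 = (40, 1000)
plausible = log_posterior(0.01, 0.06, 0.5, past_counts, counts_2020)
implausible = log_posterior(0.01, 0.25, 0.9, past_counts, counts_2020)
print(plausible, implausible)
```

An MCMC sampler explores exactly this kind of function, summed over all municipalities and age ranges, to produce the posterior draws summarised in the Results.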

We implemented a Bayesian procedure to derive point estimates and credible
intervals for the infection fatality rates. The model was estimated using
Markov Chain Monte Carlo (MCMC). We calculated the median and 95%
credible interval using the quantiles of the posterior distribution for all
parameters. To check the sensitivity of our estimates we also calculated point
estimates as the mode of the posterior distribution and credible intervals
as the 95% highest posterior density interval [19].
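
The two kinds of interval can be computed directly from posterior draws. The sketch below is a generic illustration under stated assumptions: the draws come from an arbitrary skewed Beta(2, 50) distribution standing in for MCMC output, and the helper names are introduced here. The highest posterior density interval is approximated by the shortest window containing 95% of the sorted draws, which is a common sample-based approach and not necessarily the exact method of [19].

```python
import math
import random
import statistics


def quantile(sorted_draws, q):
    """Quantile of already-sorted draws, with linear interpolation."""
    position = q * (len(sorted_draws) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_draws) - 1)
    fraction = position - lower
    return sorted_draws[lower] * (1 - fraction) + sorted_draws[upper] * fraction


def equal_tailed_interval(draws, mass=0.95):
    """Interval between the lower and upper tail quantiles of the draws."""
    ordered = sorted(draws)
    tail = (1 - mass) / 2
    return quantile(ordered, tail), quantile(ordered, 1 - tail)


def hpd_interval(draws, mass=0.95):
    """Shortest interval containing a share `mass` of the draws."""
    ordered = sorted(draws)
    window = max(1, math.ceil(mass * len(ordered)))
    start = min(range(len(ordered) - window + 1),
                key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[start], ordered[start + window - 1]


# Stand-in posterior draws from a skewed distribution (not real model output).
rng = random.Random(0)
draws = [rng.betavariate(2, 50) for _ in range(10_000)]

median = statistics.median(draws)
et_low, et_high = equal_tailed_interval(draws)
hpd_low, hpd_high = hpd_interval(draws)
print(f"median {median:.4f}")
print(f"equal-tailed 95% interval [{et_low:.4f}, {et_high:.4f}]")
print(f"HPD 95% interval          [{hpd_low:.4f}, {hpd_high:.4f}]")
```

For a skewed posterior the two intervals differ, with the highest density interval being the narrower one; for a roughly symmetric posterior they nearly coincide.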

We fitted our model using R version 3.6.2. We drew 100,000 samples from
the joint posterior distribution and used 10 independent chains, discarding
the first 1000 samples for each chain. Trace plots of the Markov Chain
Monte Carlo as well as posterior distributions for each variable are reported
in Appendix A. All analyses are fully reproducible with the code available
online.

Because our Bayesian procedure relies on modelling assumptions to derive
the infected portion of the population, we also implemented a more agnostic
approach, showing how infection fatality rates vary by contagion rate. In this
exercise we computed the infection fatality rates from a simplified version of
the model above in which we set a degenerate prior for each of the $\theta_i$
to be equal to a constant in the interval [0.1, 1] and estimated the model for
each choice.
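
A simplified version of this exercise can be sketched with a grid approximation instead of MCMC. The example below uses the Table 1 counts for the 81+ age range, summed over the seven municipalities (which is valid here because the infection rate is fixed at a common value and the death rates are assumed constant across towns). The grid sizes and function names are assumptions made for illustration, so the numbers it prints will only approximate the reported estimates.

```python
import math

# Table 1 counts for the 81+ age range, summed over the seven municipalities.
PAST_DEATHS = [57, 35, 42, 47, 56]                # 2015-2019
PAST_POPULATION = [3412, 3453, 3573, 3669, 3849]  # 2015-2019
DEATHS_2020 = 197
POPULATION_2020 = 3849                            # 2019 population reused for 2020


def log_binomial_kernel(deaths, population, rate):
    """Binomial log-likelihood, dropping the term that does not depend on rate."""
    if rate <= 0.0 or rate >= 1.0:
        return -math.inf
    return deaths * math.log(rate) + (population - deaths) * math.log1p(-rate)


def ifr_posterior_median(theta, n_delta=200, n_covid=300):
    """Posterior median of the COVID-19 fatality rate with theta held fixed."""
    past_d, past_n = sum(PAST_DEATHS), sum(PAST_POPULATION)
    # Midpoint grids over the supports of the Uniform[0, 0.1] and Uniform[0, 0.3] priors.
    delta_grid = [0.1 * (j + 0.5) / n_delta for j in range(n_delta)]
    covid_grid = [0.3 * (k + 0.5) / n_covid for k in range(n_covid)]
    past_term = [log_binomial_kernel(past_d, past_n, d) for d in delta_grid]
    log_rows = [
        [past + log_binomial_kernel(DEATHS_2020, POPULATION_2020, d + covid * theta)
         for d, past in zip(delta_grid, past_term)]
        for covid in covid_grid
    ]
    peak = max(max(row) for row in log_rows)
    # Marginalise the baseline rate out by summing over its grid.
    marginal = [sum(math.exp(value - peak) for value in row) for row in log_rows]
    half, running = 0.5 * sum(marginal), 0.0
    for covid, weight in zip(covid_grid, marginal):
        running += weight
        if running >= half:
            return covid
    return covid_grid[-1]


for theta in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"infected share {theta:.0%}: median fatality rate "
          f"{100 * ifr_posterior_median(theta):.2f}%")
```

Because the Uniform[0, 0.3] prior caps the fatality rate at 30%, very low assumed infection shares push the posterior for the oldest group against that bound.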


Role of the Funding Source 

The funders had no role in study design, data collection, data analysis, data 
interpretation, or writing of the report. All authors had full access to all 
data in the study and had final responsibility for the decision to submit for 
publication. 

3. Results 

3.1. Raw Deaths Counts 

We documented a substantial increase in total deaths at the beginning of the
outbreak. Figure 1 shows the total daily death counts in the seven
municipalities for 2020 and 2015-2019 (average) in the period between Jan 1st
and April 4th. The year 2020 and previous years are very similar preceding the
last week of February, when the first COVID-19 cases were discovered.
Starting that week, we observed a spike in the number of deaths. This spike
is clearly related to COVID-19 as it starts at the beginning of the outbreak
and it far exceeds the average fluctuations in deaths observed in previous
years. In total, deaths over the period from February 21st to April 4th
in 2020 were almost five times the average in previous years over the same
period of time (341 vs 70).
3.2. Infection Fatality Ratio Estimates

Using our Bayesian model we estimated an overall infection fatality ratio of
1.29% (95% credible interval [CrI] 0.89-2.01). We also uncovered substantial
heterogeneity by age. For people under 60 years old the infection fatality
rate was 0.05% (CrI 0-0.19, Table 2), while for people over 60 years old it
was 4.25% (CrI 3.01-6.39, Table 2).

Figure 2 shows the estimated infection fatality ratios by age group together
with 95% credible intervals and interquartile range. As expected, infection
fatality ratios were found to be much larger for older age groups. Point
estimates are 4.66% and 9.04% for 71-80 and 81+, respectively. We cannot
exclude, however, that the infection fatality ratio for people over 80 years
old is as high as 13.3% or as low as 6.61%. Interestingly, we found an
infection fatality rate close to zero for people under 50 years old, and
around 0.1% for the 51-60 group. For robustness, we have also recalculated all
point estimates using the mode, and credible intervals as the highest
posterior density interval. Results remained virtually identical (data not
shown).

Estimated infection rates were also heterogeneous by town, ranging between
21% and 79.5% (Table 2). Interestingly, Castiglione d’Adda, where antibody
tests conducted on a sample of individuals detected a 66.6% infection rate,
was the municipality with the largest estimated share of the population
infected (79.51%). We estimate a population weighted overall infection rate
for the seven towns of 40.5% (CrI 25%-58%). This is broadly consistent with a
recent study on blood donors for the entire area [14], which found a 30%
overall infection rate.

We finally performed an exercise to assess how sensitive our infection fatality
rate estimates are to different levels of contagion (Figure 3). Focusing on a
large range of potential infection rates, we found that even under the
conservative assumption that only 15% of the population was infected, people
under 60 years old still experienced an infection fatality rate significantly
below 1%, with those under 40 being around 0.1%. As expected, estimates spiked
for older age groups as we set the infection rate very low. These results
confirm the view that the infection has low lethality rates for younger
individuals, but large rates for the elderly. Moreover, this exercise showed
that the overall infection fatality rate was significantly above zero and
around 0.5% even under the most conservative assumption of 100% contagion.
4. Discussion

In this paper we estimated the infection fatality rate of COVID-19 from
administrative death counts on seven Italian municipalities that experienced
the first outbreak of this disease in late February 2020. We found an overall
infection fatality rate of 1.29% (CrI 0.89-2.01). We uncovered significant
heterogeneity across age groups. People under 60 years old have infection
fatality ratios around 0.05% (CrI 0.00-0.19). By contrast, older people are at
significantly larger risk: the over-60 infection fatality ratio is 4.25%
(CrI 3.01-6.39), and for over-80 it is estimated at 9.04% (CrI 6.61-13.30).
Finally, we excluded very low fatality rates even under very conservative
assumptions on the population infection rate: if the entire population had
been infected, the overall infection fatality ratio would still be around
0.5%.

Our result for overall infection fatality ratio is larger than estimates in
[20] based on travellers’ data. It is, however, remarkably close to estimates
for two case studies where the entire population was tested: Diamond Princess
Cruise (1.3%) [21], and Vo’ Euganeo (1%), a municipality of 3,000 inhabitants
in the Italian Veneto region. Despite the similarity there are two main reasons
to expect differences between our estimates and those in the two mentioned
case studies. First, we estimate a large infection fatality ratio for people
over 80 years old, who are likely underrepresented in the Diamond Princess
Cruise. Second, total population in both Diamond Princess Cruise and Vo’
Euganeo is limited and for this reason those estimates might be affected by
under-sampling of the number of infected in the tails. These small differences
notwithstanding, all these figures are substantially lower than estimated case
fatality rates (CFRs) computed with official contagion data and that have
previously been estimated for influenza pandemics. The likely reason is that
COVID-19 case counts are subject to downward biases due to limited testing
capacity and testing strategies that prioritize symptomatic cases despite a
large number of asymptomatic patients.

Our estimates could suffer from the fact that deaths data is missing for the
period after April 4th 2020. This would be an issue if the contagion had not
stopped by that date and therefore useful information on deaths could not be
used in our model. However, as the trend in number of deaths suggests, the
contagion likely stopped by April 4th in these seven municipalities. Indeed,
the number of deaths in the last days of our sample went back to the average
number in the previous five years. We therefore believe that data for the
following weeks would not be very informative on the lethality of the first
wave of contagion.

One limitation of our estimates is that they should not be taken at face value 
in the analysis of contexts where the hospital system is under stress or close 
to capacity. This is because our exercise was performed on a case study 
at the beginning of the Italian outbreak when the hospital system was still 
fully functioning. Moreover, the quarantine measures implemented in the 
seven municipalities on February 21st likely reduced contagion, potentially 
affecting infection fatality ratios and making them hard to extrapolate to
contexts where similar measures were not undertaken.

Another limitation is that our model assumed a constant baseline lethality 
rate in absence of COVID-19. This implies that the COVID-19 outbreak 
did not change the baseline death rate in the population. Plausibly, the 
lockdown policy decreased deaths from, among others, violence and traffic, 
while at the same time the outbreak could have increased other fatalities due 
to lower availability of healthcare resources for other diseases. Fluctuations 
due to these causes are, however, likely to be quantitatively small compared 
to the large spike in deaths that we observed in 2020, when total deaths
were almost five times the average in previous years over the same period
of time (341 vs 70). For this reason, confounding factors in baseline death
levels should not have substantially affected the signal to noise ratio of our
death data.

In conclusion, our results support the need for isolating policies, especially
for the older part of the population, given the extremely high fatality rates
from COVID-19 we estimated.
List of Tables


Table 1: Total population and deaths by age range

Age Range    Total Population                            Total Deaths
             2015    2016    2017    2018    2019        2015  2016  2017  2018  2019  2020
0-20        8,046   8,039   8,012   8,035   8,039           0     0     0     0     0     1
21-40      10,202  10,058   9,851   9,729   9,711           0     0     1     0     0     0
41-50       7,984   7,832   7,629   7,461   7,225           1     2     0     0     2     2
51-60       6,922   7,133   7,284   7,484   7,599           3     2     5     2     5     5
61-70       5,723   5,704   5,707   5,680   5,681           4     5     4     7     6    29
71-80       4,733   4,760   4,843   4,857   4,903          13    10    18    17    10   107
81+         3,412   3,453   3,573   3,669   3,849          57    35    42    47    56   197
Overall    47,022  46,979  46,899  46,915  47,007          78    54    70    73    79   341

This table reports descriptive statistics by age range on total population for 2015-2019, and deaths for 2015-2020. Data
refers to the period from January 1st to April 4th.
Table 2: Model Estimates

Parameter                                  Estimate (%)   CrI (%)

Infection Fatality Ratios
$\delta^{\mathrm{Covid}}_{Overall}$              1.2855   [0.8878; 2.0110]
$\delta^{\mathrm{Covid}}_{<60}$                  0.0524   [0.0041; 0.1887]
$\delta^{\mathrm{Covid}}_{61+}$                  4.2509   [3.0132; 6.3937]
$\delta^{\mathrm{Covid}}_{0-20}$                 0.0490   [0.0048; 0.1775]
$\delta^{\mathrm{Covid}}_{21-40}$                0.0176   [0.0008; 0.0952]
$\delta^{\mathrm{Covid}}_{41-50}$                0.0476   [0.0020; 0.2007]
$\delta^{\mathrm{Covid}}_{51-60}$                0.1076   [0.0096; 0.3138]
$\delta^{\mathrm{Covid}}_{61-70}$                1.0280   [0.5876; 1.7380]
$\delta^{\mathrm{Covid}}_{71-80}$                4.6620   [3.3220; 6.9930]
$\delta^{\mathrm{Covid}}_{81+}$                  9.0400   [6.6180; 13.3000]

Baseline Death Rates
$\delta_{0-20}$                                  0.0016   [0.0000; 0.0085]
$\delta_{21-40}$                                 0.0028   [0.0004; 0.0093]
$\delta_{41-50}$                                 0.0152   [0.0060; 0.0299]
$\delta_{51-60}$                                 0.0452   [0.0276; 0.0691]
$\delta_{61-70}$                                 0.0935   [0.0624; 0.1331]
$\delta_{71-80}$                                 0.2830   [0.2220; 0.3550]
$\delta_{81+}$                                   1.3290   [1.1680; 1.5030]

Infection Rates
$\theta_{Casalpusterlengo}$                       21.03   [12.38; 32.48]
$\theta_{Castiglione d’Adda}$                     79.51   [54.23; 96.23]
$\theta_{Codogno}$                                44.52   [29.37; 62.76]
$\theta_{Fombio}$                                 57.54   [31.12; 87.30]
$\theta_{Maleo}$                                  69.32   [44.21; 92.53]
$\theta_{San Fiorano}$                            43.64   [20.38; 75.97]
$\theta_{Somaglia}$                               21.86   [9.02; 42.50]

This table reports posterior median estimates from the model described in Section 2.3. For each parameter the last column
reports the 95% credible interval.
List of Figures


Figure 1: Daily deaths in 2020 and previous years average 



The black solid line reports the total deaths by day in 2020 in the seven municipalities, while the blue 
dashed line reports the average fatalities on the same day of the previous five years. The horizontal red 
line marks February 20th 2020, when the first patient was admitted to the ICU in this area. 
Figure 2: Estimates of the infection fatality rate by age range



For each age range a on the horizontal axis, the figure reports information on the posterior distribution
of the COVID-19 infection fatality rate $\delta^{\mathrm{Covid}}_a$ from estimating the model described in
Section 2.3. Boxes represent estimated interquartile ranges, with the central line reporting the median
posterior estimate; vertical lines represent 1.5 times the interquartile range.
Figure 3: Estimates of infection fatality rate by age and proportion of population infected

For each proportion of the population affected between 10% and 80% on the horizontal axis, we estimate
a restricted version of the model described in Section 2.3 in which the infection rate $\theta_i$ is set equal for
each municipality at that value. Solid lines represent the median of each posterior while the surrounding
bands report 95% credible intervals from the 2.5 to the 97.5 percentiles of the posterior distributions. The
Overall estimate is obtained by weighting posteriors by population shares in 2019.
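
The population weighting can be illustrated with the figures in Tables 1 and 2. The sketch below weights the posterior medians of the age-specific fatality rates by 2019 population shares. This is a simplification assumed here for illustration: the Overall estimate weights entire posterior distributions, so the weighted average of medians only approximates the reported overall median of 1.29%.

```python
# Posterior medians of the age-specific infection fatality rates (Table 2, in %).
ifr_median_pct = {
    "0-20": 0.0490, "21-40": 0.0176, "41-50": 0.0476, "51-60": 0.1076,
    "61-70": 1.0280, "71-80": 4.6620, "81+": 9.0400,
}

# 2019 population by age range (Table 1).
population_2019 = {
    "0-20": 8039, "21-40": 9711, "41-50": 7225, "51-60": 7599,
    "61-70": 5681, "71-80": 4903, "81+": 3849,
}

total_population = sum(population_2019.values())

# Weighted average of the medians, with weights equal to population shares.
weighted_ifr_pct = sum(
    ifr_median_pct[age] * population_2019[age] / total_population
    for age in ifr_median_pct
)

print(f"total population: {total_population}")
print(f"population-weighted average of medians: {weighted_ifr_pct:.2f}%")
```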
Appendix A. Markov Chain Monte Carlo Estimation diagnostics


Figure Appendix A.1: Trace and density plots for MCMC posteriors of $\delta^{\mathrm{Covid}}_a$ for each
age range



For each of the seven age ranges a, this figure reports diagnostic plots for the MCMC
simulation of the model described in Section 2.3 for the COVID-19 infection mortality rate
parameter: $\delta^{\mathrm{Covid}}_a$. The left panels report trace plots of the last 5000 draws from the
posterior to check convergence. The right panels report the corresponding posterior
distribution estimate (black solid line) together with the prior distribution for that parameter
(red solid line). The % overlap reported in red is the PPO (prior posterior overlap).
Figure Appendix A.2: Trace and density plots for MCMC posteriors of $\delta_a$ for each age
range



For each of the seven age ranges a, this figure reports diagnostic plots for the MCMC
simulation of the model described in Section 2.3 for the baseline mortality rate parameter:
$\delta_a$. The left panels report trace plots of the last 5000 draws from the posterior to check
convergence. The right panels report the corresponding posterior distribution estimate
(black solid line) together with the prior distribution for that parameter (red solid line).
The % overlap reported in red is the PPO (prior posterior overlap).
Figure Appendix A.3: Trace and density plots for MCMC posteriors of $\theta_i$ for each
municipality



For each of the seven municipalities i, this figure reports diagnostic plots for the MCMC
simulation of the model described in Section 2.3 for the infection rate parameter:
$\theta_i$. The left panels report trace plots of the last 5000 draws from the posterior to check
convergence. The right panels report the corresponding posterior distribution estimate
(black solid line) together with the prior distribution for that parameter (red solid line).
The % overlap reported in red is the PPO (prior posterior overlap).